Pooling specimens, a well-accepted sampling strategy in biomedical research, can be applied to reduce the cost of studying biomarkers. Even if the cost of a single assay is not a major restriction in evaluating biomarkers, pooling can be a powerful design that increases the efficiency of estimation based on data that are censored due to an instrument's lower limit of detection (LLOD). However, there are situations when the pooling design strongly aggravates the detection limit problem. To combine the benefits of pooled assays and individual assays, hybrid designs that involve taking a sample of both pooled and individual specimens have been proposed. We examine the efficiency of these hybrid designs in estimating parameters of two systems subject to a LLOD: (1) a normally distributed biomarker with normally distributed measurement error and pooling error; (2) a Gamma distributed biomarker with double exponentially distributed measurement error and pooling error. A three-assay design and a two-assay design with replicates are applied to estimate the measurement and pooling errors. The maximum likelihood method is used to estimate the parameters. We found that the simple one-pool design, where all assays but one are random individuals and a single pooled assay includes the remaining specimens, is, under plausible conditions, very efficient and can be recommended for practical use.
1. Introduction. Epidemiological studies frequently investigate the relationship between biomarkers and disease. In such studies, assaying specimens for biomarkers can be expensive. For example, a single assay to measure polychlorinated biphenyl (PCB) costs between $500 and $1,000 [Louis et al. (2005)]. The high cost severely constrains the number of assays that can be performed in a study, thereby limiting the study's ability to characterize a biomarker-disease association.

Two study designs, the pooling design and the simple random sampling 
design, have been proposed to reduce total assaying cost. Pooling involves as- 
saying only pooled, that is, physically mixed, specimens [Sham et al. (2002)]. 
Each pooled specimen is obtained by physically mixing p individual specimens, where p is the pooling group size, and each pooled specimen is assumed to contain
an amount of biomarker that is the mean of the amounts contained in 
its constituent individual specimens [Vexler, Liu and Schisterman (2006), 
Faraggi, Reiser and Schisterman (2003), Schisterman et al. (2001, 2005), 
Vexler et al. (2008)]. Simple random sampling involves assaying only a sim- 
ple random sample of individual specimens [Dorfman (1943), Liu and Schis- 
terman (2003), Liu, Schisterman and Teoh (2004), Vexler, Schisterman and 
Liu (2008), Weinberg and Umbach (1999), Zhang and Gant (2005)]. 

Not only does cost hinder the characterization of a biomarker-disease as- 
sociation, instrument sensitivity does as well. An instrument may be unable 
to detect an amount of biomarker below a certain level, the lower limit 
of detection (LLOD) [Vexler, Liu and Schisterman (2006), Mumford et al. (2006), Vexler et al. (2008), Schisterman et al. (2006)]. Biomarker values
above the LLOD are numerically determined, but values below the LLOD 
are censored. Because instrument sensitivity is an important issue in many 
areas such as occupational medicine and epidemiology, LLOD issues have 
been extensively dealt with in the biostatistical literature [Schisterman et al. 
(2006), Richardson and Ciampi (2003)]. 

Investigations of the efficiencies of pooling and simple random sampling 
in parameter estimation when data are subject to a LLOD have been per- 
formed. Mumford et al. (2006) and Vexler, Liu and Schisterman (2006) 
showed that, in the context of biomarker mean and variance estimation, 
there is always an interval of LLOD values for which pooling is more effi- 
cient than simple random sampling and sometimes even more efficient than 
assaying each and every individual specimen. This phenomenon can be ex- 
plained by the fact that, when a LLOD is below the mean of a biomarker 
distribution, a pooled assay has a greater chance of being above the LLOD 
than an individual assay [Schisterman and Vexler (2008)]. Mumford et al. 
(2006) also showed that pooling is more efficient than simple random sampling at estimating the area under the receiver operating characteristic curve (AUC) when the LLOD affects less than 50% of the data. However, when the LLOD is substantially greater than the mean of the biomarker distribution, the pooling design is less efficient than simple random sampling at estimating the AUC. Furthermore, reconstructing the characteristics of individual assays from pooled data is generally a complex problem [Vexler, Schisterman and Liu (2008)].

The merits of pooling and simple random sampling led to the consider- 
ation of hybrid designs, which combine pooling and simple random sam- 
pling. Some randomly sampled individual specimens are each assayed, and 
the remaining assays are pooled assays. The efficiency of hybrid designs at 
parameter estimation has been considered when data are not affected by 
a LLOD [Schisterman et al. (2010)]. The present article extends previous 
work by examining the efficiency of a variety of hybrid designs at estimating 
biomarker distribution parameters and any assaying errors, when assays are 
affected by a LLOD. When a LLOD is present, ignoring the censored values or replacing them with a fixed value may lead to severe bias, so it is important to extend our previous work to include the LLOD. We also illustrate several hybrid designs under different conditions. We consider the efficiency of hybrid designs under various combinations of pooling error and measurement error. In particular, we are interested in a special case of the general hybrid design, which we call the one-pool design, where all assays but one are random individuals and a single pooled assay includes the remaining specimens. This one-pool design is easy to execute in practice. Our approach applies to an upper limit of detection (ULOD) as well.

In the following sections we examine the efficiencies of hybrid designs when 
data are subject to various errors and a LLOD. A three-assay design and a two-assay design with replicates are applied to account for the pooling error and measurement error. The three-assay design combines one individual sampling group and two pooling groups with different pooling sizes, while the two-assay design with replicates combines an individual sampling group and one pooling group, where each group is measured in replicate. Both designs can be used to estimate the parameters of the biomarker, the measurement error and the pooling error. The variances of the parameter estimators are evaluated for both normally and Gamma distributed biomarker levels. Finally, we apply the hybrid design to two cases: (1) normally distributed data on cholesterol, a coronary heart disease biomarker, and (2) Gamma distributed data on a chemokine biomarker with double exponentially distributed measurement error and pooling error.

2. Pooled-unpooled hybrid design subject to a LLOD. In this section 
we describe a hybrid design, which combines assays on individual speci- 
mens and assays on pooled specimens, when assays are subject to a LLOD. 
Suppose we have uncorrelated specimens $\{X_s, s = 1, \dots, N\}$, and we can perform only $n$ assays. Let α be the proportion of the $n$ assays that are assays of individual specimens randomly sampled from all individual specimens. When $\alpha = 1$, only $n$ of the specimens are used, which is the simple random sampling design. We measure $\alpha n$ individual specimens $\{X_s, s = 1, \dots, \alpha n\}$ and use the remaining $N - \alpha n$ individual specimens $\{X_s, s = \alpha n + 1, \dots, N\}$ to create $(1-\alpha)n$ pooled specimens $\{X_i^{(p)}, i = 1, \dots, (1-\alpha)n\}$. Here we use the subscript $i$ to index assays. Ideally we would obtain pooled measurements

$$X_i^{(p)} = \frac{1}{p} \sum_{s=(i-1)p+\alpha n+1}^{ip+\alpha n} X_s,$$

where $p$ is the pooling group size, $p = \left[\frac{N - \alpha n}{(1-\alpha)n}\right]$, and $[x]$ denotes $x$ rounded to the nearest integer. When $\alpha n = n - 1$, we have the one-pool design with $n - 1$ individual assays $\{X_i, i = 1, \dots, n-1\}$ and 1 pooled assay $X_1^{(N-n+1)}$. Thus $\alpha_{\text{one-pool}} = 1 - 1/n$ is the maximum of α under the hybrid design.
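The allocation rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' supplementary R code: the function names are hypothetical, and halves are rounded up, which is one reading of the integer rounding $[x]$.

```python
import math


def round_half_up(x: float) -> int:
    """Integer rounding [x] for nonnegative x, with halves rounded up (assumed convention)."""
    return math.floor(x + 0.5)


def hybrid_allocation(N: int, n: int, alpha: float):
    """Split n assays of N specimens into individual assays and pools of equal size p.

    Returns (n_individual, n_pooled, p); p is None when alpha = 1 (simple random sampling).
    """
    n_ind = round_half_up(alpha * n)           # alpha * n individual assays
    n_pool = n - n_ind                         # (1 - alpha) * n pooled assays
    if n_pool == 0:
        return n_ind, 0, None
    p = round_half_up((N - n_ind) / n_pool)    # p = [(N - alpha n) / ((1 - alpha) n)]
    return n_ind, n_pool, p


if __name__ == "__main__":
    N, n = 1000, 100
    alpha_one_pool = 1 - 1 / n                 # one-pool design: n - 1 individuals, one pool
    for alpha in (0.0, 0.5, alpha_one_pool):
        print(alpha, hybrid_allocation(N, n, alpha))
```

For $N = 1{,}000$ and $n = 100$, the one-pool design pools the remaining 901 specimens into a single assay.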

In this article we study the hybrid design in a realistic scenario where assays have measurement error and pooling error and are also subject to a LLOD.
A simple two-assay hybrid design composed of an individual assay group 
and a pooled assay group is not enough to estimate both measurement error 
and pooling error. We can apply two approaches to estimate both errors: 
(1) three-assay hybrid design and (2) two-assay hybrid design with repli- 
cates. 

2.1. Three-assay hybrid design. A three-assay hybrid design consists of three different groups: an individual group $Z^{(1)}$, a pooled group $Z^{(p_1)}$ of pooling group size $p_1$, and a pooled group $Z^{(p_2)}$ of pooling group size $p_2$. Let α be the fraction of assays that are individual assays, and β the fraction of assays that are second pooled assays with pooling size $p_2$. The numbers of assays in the three groups are $n_1 = \alpha n$, $n_{p_1} = (1-\alpha-\beta)n$ and $n_{p_2} = \beta n$, respectively. The total number of specimens is $N = \alpha n + (1-\alpha-\beta)n p_1 + \beta n p_2$. Given β and $p_2$, we obtain $p_1 = \left[\frac{N - \alpha n - \beta n p_2}{(1-\alpha-\beta)n}\right]$. Due to the LLOD, each observation takes the form

$$Z_i^{(w)} = \begin{cases} X_i^{(w)} + \gamma(w)\, e_i^{(p)} + e_i^{(m)}, & X_i^{(w)} + \gamma(w)\, e_i^{(p)} + e_i^{(m)} > \mathrm{LLOD},\\ \mathrm{N/A}, & X_i^{(w)} + \gamma(w)\, e_i^{(p)} + e_i^{(m)} < \mathrm{LLOD}, \end{cases}$$

where $w = 1, p_1, p_2$ ($p_1 \neq p_2$, since the three-assay design reduces to the two-assay design when $p_1 = p_2$), $i = 1, \dots, n_w$, $X_i^{(1)}$ are the individual specimens, $e_i^{(m)}$ is measurement error, $e_i^{(p)}$ is pooling error, and $\gamma(w)$ is a known function such that $\gamma(1) = 0$. For simplicity, we assume $\gamma(p_1) = \gamma(p_2) = 1$. When $\alpha n = n - 1 - \beta n$, we have the one-pool design with $n - 1 - \beta n$ individual assays $\{X_i, i = 1, \dots, n - 1 - \beta n\}$, 1 pooled assay with pooling size $p_1$, and $\beta n$ pooled assays $\{X_i^{(p_2)}, i = 1, \dots, \beta n\}$ with pooling size $p_2$. We have $\alpha_{\text{one-pool}} = 1 - \beta - 1/n$. When $\beta = 0$, the three-assay design reduces to the two-assay design, that is, $\alpha_{\text{one-pool}} = 1 - 1/n$.
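To make the bookkeeping concrete, the sketch below computes the group sizes and $p_1$ from $N$, $n$, α, β and $p_2$. It is a hypothetical helper written for illustration (rounding halves up is an assumed reading of $[x]$); with the settings used later in Section 2.4.2 it reproduces the one-pool configuration with $p_1 = 741$.

```python
import math


def three_assay_allocation(N: int, n: int, alpha: float, beta: float, p2: int):
    """Return (n1, n_p1, n_p2, p1) for the three-assay hybrid design.

    n1 = alpha*n individual assays, n_p2 = beta*n pools of size p2, and the remaining
    n_p1 = (1 - alpha - beta)*n pools share the leftover specimens with common size p1.
    """
    n1 = math.floor(alpha * n + 0.5)
    n_p2 = math.floor(beta * n + 0.5)
    n_p1 = n - n1 - n_p2
    if n_p1 <= 0:
        raise ValueError("the first pooled group must contain at least one assay")
    p1 = math.floor((N - n1 - n_p2 * p2) / n_p1 + 0.5)
    return n1, n_p1, n_p2, p1


def alpha_one_pool(n: int, beta: float) -> float:
    """Largest alpha under the three-assay design: all but one non-p2 assay are individual."""
    return 1.0 - beta - 1.0 / n


if __name__ == "__main__":
    a = alpha_one_pool(100, 0.4)
    print(round(a, 2), three_assay_allocation(1000, 100, a, 0.4, 5))
```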
2.2. Two-assay design with replicates. Another approach to estimating pooling and measurement errors is a two-assay design with replicates. In practice, laboratories often measure each assay twice. When an individual specimen is measured twice, we have

$$Z_{i1}^{(1)} = X_i + e_{i1}^{(m)}, \qquad Z_{i2}^{(1)} = X_i + e_{i2}^{(m)},$$

where $Z_{i1}^{(1)}$ and $Z_{i2}^{(1)}$ are the measured values, $X_i$ is the true value, and $e_{i1}^{(m)}$ and $e_{i2}^{(m)}$ are measurement errors. In practice, laboratories often use the average of $Z_{i1}^{(1)}$ and $Z_{i2}^{(1)}$ as the true biomarker value. We also have

$$\Delta Z^{(1)} = Z_{i1}^{(1)} - Z_{i2}^{(1)} = e_{i1}^{(m)} - e_{i2}^{(m)}. \qquad (1)$$

By fitting the distribution of $\Delta Z^{(1)}$, we can obtain the parameter of the measurement error $e^{(m)}$. For pooled assays, we have

$$Z_{i1}^{(p)} = X_i^{(p)} + e_{i1}^{(p)} + e_{i1}^{(m)}, \qquad Z_{i2}^{(p)} = X_i^{(p)} + e_{i2}^{(p)} + e_{i2}^{(m)},$$

where $e_{i1}^{(p)}$ and $e_{i2}^{(p)}$ are pooling errors. We also have

$$\Delta Z^{(p)} = Z_{i1}^{(p)} - Z_{i2}^{(p)} = \bigl(e_{i1}^{(m)} + e_{i1}^{(p)}\bigr) - \bigl(e_{i2}^{(m)} + e_{i2}^{(p)}\bigr). \qquad (2)$$

By fitting the distribution of $\Delta Z^{(p)}$, we can obtain the parameter of the sum of measurement error and pooling error, $e^{(m)} + e^{(p)}$. After obtaining the estimates of the pooling and measurement errors, we can use a two-assay design involving one individual sampling group and only one pooling group to estimate the parameters of the biomarker.
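Because the two replicate errors in (1) are independent with a common variance, $\mathrm{Var}(\Delta Z^{(1)}) = 2\mathrm{Var}(e^{(m)})$; likewise (2) gives $\mathrm{Var}(\Delta Z^{(p)}) = 2\mathrm{Var}(e^{(m)}) + 2\mathrm{Var}(e^{(p)})$ when measurement and pooling errors are also independent of each other. The sketch below checks this moment argument on simulated duplicates. The normal errors, the standard deviations and the function names are illustrative assumptions, not values or code from the study.

```python
import random
import statistics


def replicate_error_variances(ind_pairs, pool_pairs):
    """Moment estimates of Var(e_m) and Var(e_p) from duplicate individual and pooled assays."""
    d1 = [z1 - z2 for z1, z2 in ind_pairs]    # Delta Z^(1) = e_m1 - e_m2
    dp = [z1 - z2 for z1, z2 in pool_pairs]   # Delta Z^(p) = (e_m1 + e_p1) - (e_m2 + e_p2)
    var_d1 = statistics.variance(d1)
    var_dp = statistics.variance(dp)
    return var_d1 / 2.0, (var_dp - var_d1) / 2.0


def simulate_duplicates(n_ind, n_pool, p, sd_x, sd_m, sd_p, rng):
    """Duplicate assays: both replicates share the true value but get fresh errors."""
    ind_pairs = []
    for _ in range(n_ind):
        x = rng.gauss(0.0, sd_x)
        ind_pairs.append((x + rng.gauss(0.0, sd_m), x + rng.gauss(0.0, sd_m)))
    pool_pairs = []
    for _ in range(n_pool):
        xp = statistics.fmean(rng.gauss(0.0, sd_x) for _ in range(p))  # ideal pool average
        pool_pairs.append((xp + rng.gauss(0.0, sd_p) + rng.gauss(0.0, sd_m),
                           xp + rng.gauss(0.0, sd_p) + rng.gauss(0.0, sd_m)))
    return ind_pairs, pool_pairs


if __name__ == "__main__":
    rng = random.Random(2011)
    ind, pooled = simulate_duplicates(20000, 20000, 2, 1.0, 0.4, 0.3, rng)
    var_m, var_p = replicate_error_variances(ind, pooled)
    print(var_m, var_p)   # should be near 0.4**2 = 0.16 and 0.3**2 = 0.09
```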

2.3. Maximum likelihood estimate. The literature on limits of detection relies largely on maximum likelihood (ML), because a distribution must be assumed for the data that cannot be measured below the limit of detection: the distribution above the limit is assessed and assumed to hold below it as well, and ML estimation follows naturally from this assumption. One simple way to address the LLOD is to substitute a replacement value for unobservable data. However, this leads to biased estimates, and it has been shown that the best replacement value is often $E[X \mid X < d]$, where $d$ is the detection limit; computing it requires the same distributional assumption below the limit of detection. In this article, we use the ML method to handle LLOD data because it yields asymptotically unbiased estimates of the parameters [Gupta (1952), Chapman (1956)]. We consider two cases: (1) a normally distributed biomarker with normally distributed measurement error and pooling error, and (2) a Gamma distributed biomarker with double exponentially distributed measurement error and pooling error.
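The role of $E[X \mid X < d]$ can be seen from a short calculation for a normal biomarker $X \sim N(\mu, \sigma^2)$ with $z = (d-\mu)/\sigma$: since $E[X\,\mathbf{1}\{X > d\}] = (1-\Phi(z))\mu + \sigma\phi(z)$, replacing every censored value by a constant $v$ gives an expected sample mean of $(1-\Phi(z))\mu + \sigma\phi(z) + \Phi(z)v$, which equals μ only when $v = \mu - \sigma\phi(z)/\Phi(z) = E[X \mid X < d]$. The sketch below evaluates this expression. It assumes μ and σ are known, which is exactly the distributional assumption the replacement approach cannot avoid; the ad hoc value $d/\sqrt{2}$ is shown only as an example of a fixed substitute.

```python
import math


def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_mean_after_substitution(mu, sigma, d, fill):
    """Expected sample mean when every value below the LLOD d is replaced by `fill`."""
    z = (d - mu) / sigma
    observed_part = (1.0 - Phi(z)) * mu + sigma * phi(z)   # E[X 1{X > d}]
    return observed_part + Phi(z) * fill


def mean_below_llod(mu, sigma, d):
    """E[X | X < d] for X ~ N(mu, sigma^2)."""
    z = (d - mu) / sigma
    return mu - sigma * phi(z) / Phi(z)


if __name__ == "__main__":
    mu, sigma, d = 0.0, 1.0, 0.5
    print(expected_mean_after_substitution(mu, sigma, d, d / math.sqrt(2.0)))             # biased
    print(expected_mean_after_substitution(mu, sigma, d, mean_below_llod(mu, sigma, d)))  # ~ mu
```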

2.3.1. Normally distributed biomarker and errors. Let the individual biomarker values be independently and identically distributed as

$$X_i^{(1)} \sim N(\mu_x, \sigma_x^2), \quad i = 1, \dots, \alpha n,$$

where $\alpha \in (0, 1)$. By pooling the remaining $N - \alpha n$ specimens into $(1-\alpha)n$ assays, ideally we would obtain pooled measurements following the normal distribution

$$X_i^{(p)} \sim N(\mu_x, \sigma_x^2/p), \quad i = 1, \dots, (1-\alpha)n.$$

We assume that the measurement error $e_i^{(m)}$ and pooling error $e_i^{(p)}$ follow independent normal distributions

$$e_i^{(m)} \sim N(0, \sigma_m^2), \qquad e_i^{(p)} \sim N(0, \sigma_p^2), \quad i = 1, \dots, n.$$

The detailed likelihood function is available in the supplementary material [Schisterman et al. (2011), Section 1].
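For intuition about the likelihood referenced above, note that under these assumptions an assay of pooling size $w$ is normal with mean $\mu_x$ and variance $\sigma_x^2/w + \gamma(w)\sigma_p^2 + \sigma_m^2$, so an observed value contributes its normal log-density and a censored value contributes $\log P(Z^{(w)} < \mathrm{LLOD})$. The sketch below is our illustration of that structure, not the expression from the supplementary material; the `(w, z)` data layout with `None` for censored assays is an assumption made for the example.

```python
import math
from statistics import NormalDist


def hybrid_normal_loglik(mu_x, sd_x, sd_p, sd_m, data, llod):
    """Log-likelihood of hybrid-design assays subject to an LLOD (normal model).

    data: iterable of (w, z) pairs, where w is the pooling size (1 for an individual
    assay) and z is the measured value, or None if the assay fell below the LLOD.
    """
    ll = 0.0
    for w, z in data:
        gamma = 0.0 if w == 1 else 1.0     # gamma(1) = 0, gamma(p) = 1 for pools
        sd = math.sqrt(sd_x ** 2 / w + gamma * sd_p ** 2 + sd_m ** 2)
        if z is None:
            # Censored assay: probability of falling below the detection limit.
            ll += math.log(NormalDist(mu_x, sd).cdf(llod))
        else:
            # Observed assay: normal log-density, written out to avoid underflow.
            r = (z - mu_x) / sd
            ll += -0.5 * r * r - math.log(sd) - 0.5 * math.log(2.0 * math.pi)
    return ll


if __name__ == "__main__":
    example = [(1, 0.8), (1, None), (1, -0.2), (5, 0.1), (741, 0.05)]
    print(hybrid_normal_loglik(0.0, 1.0, 0.3, 0.4, example, llod=-0.5))
```

Maximizing this function numerically over the four parameters would give the MLEs, provided the design identifies $\sigma_p$ and $\sigma_m$ separately, as discussed in Sections 2.1 and 2.2.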

2.3.2. Gamma distributed biomarker and double exponentially distributed 
errors. In certain situations, the distribution of the biomarker values is 
skewed, and the normality assumptions cannot be applied. In these circum- 
stances, the Gamma distribution is a reasonable alternative. Furthermore, 
the distribution of measurement and pooling errors can vary by shape, and 
the normality assumptions are not always reasonable. In these cases, the double exponential distribution might be appropriate, because it is symmetric with mean zero. Suppose that the individual biomarker $X_i$ follows a Gamma distribution

$$X_i \sim g(x; a, b) = \frac{1}{\Gamma(a)\, b^{a}}\, e^{-x/b}\, x^{a-1}, \quad i = 1, \dots, \alpha n.$$

For pooled assays with pooling size $p$, using the additive property of the Gamma distribution, we have

$$X_i^{(p)} \sim g(x; ap, b/p), \quad i = 1, \dots, (1-\alpha)n,$$

and the measurement error and pooling error follow double exponential distributions with scale parameters $c$ and $d$, respectively,

$$e_i^{(m)} \sim h(x; c) = \frac{1}{2c}\, e^{-|x|/c}, \qquad e_i^{(p)} \sim h(x; d) = \frac{1}{2d}\, e^{-|x|/d}, \quad i = 1, \dots, n.$$

The detailed likelihood function is available in the supplementary material 
[Schisterman et al. (2011), Section 2]. 
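The distributional facts used above can be checked by simulation: the average of $p$ independent $\mathrm{Gamma}(a, b)$ specimens (shape $a$, scale $b$) has mean $ab$ and variance $ab^2/p$, as implied by $g(x; ap, b/p)$, and a double exponential error with scale $c$ has variance $2c^2$. The sketch below uses Python's standard `random` module; the parameter values match the simulation settings of Section 2.4.3, while the seed and sample sizes are arbitrary choices.

```python
import random
import statistics


def pooled_gamma_values(a, b, p, n_pools, rng):
    """Ideal pooled measurements: averages of p i.i.d. Gamma(shape=a, scale=b) specimens."""
    return [statistics.fmean(rng.gammavariate(a, b) for _ in range(p)) for _ in range(n_pools)]


def double_exponential_values(scale, size, rng):
    """Double exponential (Laplace) draws centred at 0: a random sign times Exp(mean=scale)."""
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    a, b, p = 1.5, 0.1, 4
    pools = pooled_gamma_values(a, b, p, 50000, rng)
    print(statistics.fmean(pools), a * b)                  # mean of g(x; ap, b/p) is ab
    print(statistics.variance(pools), a * b ** 2 / p)      # variance is (ap)(b/p)^2 = ab^2/p
    errors = double_exponential_values(0.02, 100000, rng)
    print(statistics.variance(errors), 2 * 0.02 ** 2)      # variance 2c^2
```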

2.4. Evaluation. In this section we evaluate three cases: (1) normally 
distributed biomarker with negligible measurement error and pooling error 
under two-assay design, (2) normally distributed biomarker with normally 
distributed measurement error and pooling error under three-assay design, 
and (3) Gamma distributed biomarker and double exponentially distributed 
measurement error and pooling error under two-assay design. 
2.4.1. Normal case with negligible pooling error and measurement error. We are interested in the one-pool design, a special case of the hybrid design, because it is simple and easily executed in practice. The one-pool design uses $\alpha n = n - 1$ individual assays, so the single remaining assay, $(1-\alpha)n = 1$, is a pool of the remaining $N - (n-1)$ specimens. We first use a simple case with negligible pooling error and measurement error to illustrate the efficiency of the one-pool design.

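When random sampling and pooling are combined in the hybrid design, the data consist of individual and pooled observations $\{Z_1^{(1)}, \dots, Z_{\alpha n}^{(1)}, Z_1^{(p)}, \dots, Z_{(1-\alpha)n}^{(p)}\}$. If we assume that the measurement error and pooling error are negligible, that is, $e^{(m)} = 0$ and $e^{(p)} = 0$, the three-assay design reduces to a two-assay design ($\beta = 0$). Each observation takes the form

$$Z_i^{(w)} = \begin{cases} X_i^{(w)}, & X_i^{(w)} > \mathrm{LLOD},\\ \mathrm{N/A}, & X_i^{(w)} < \mathrm{LLOD}, \end{cases} \qquad w = 1, p.$$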
The log-likelihood function for the normal distribution is a function of only the parameters $\mu_x$ and $\sigma_x$. To calculate the MLEs of $\mu_x$ and $\sigma_x$, we solve the system of log-likelihood first-derivative equations $\{\partial l/\partial\mu_x = 0,\ \partial l/\partial\sigma_x = 0\}$. Expressions for the log-likelihood equations and the entries of the Fisher information matrix $I$ can be found in the supplementary material [Schisterman et al. (2011), Section 3]. The asymptotic variances of the estimators can be analyzed with respect to α (the proportion of assays that are individual assays), and the α that minimizes the variance of an MLE can be identified.

Figure 1 illustrates the asymptotic variances $\mathrm{Var}(\hat\mu_x)$ and $\mathrm{Var}(\hat\sigma_x)$ versus α for LLOD = -5, -0.5, -0.1, 0, 0.01, 0.04, 0.1, 0.3 and 0.5 from bottom to top, with $N = 1{,}000$, $n = 100$, $\mu_x = 0$ and $\sigma_x = 1$. Note that the rightmost point is at $\alpha = (n-1)/n$, that is, the one-pool design, rather than $\alpha = 1$.

When the LLOD is negligible (e.g., LLOD = -5), $\mathrm{Var}(\hat\mu_x)$ is approximately constant for $\alpha < 1$ in Figure 1(a). $\mathrm{Var}(\hat\mu_x)$ increases as the LLOD increases. For LLOD ≤ $\mu_x$, for example, LLOD = $\mu_x$ (i.e., 0) and LLOD = $\mu_x - 0.1\sigma_x$ (i.e., -0.1), $\mathrm{Var}(\hat\mu_x)$ decreases as α increases and takes its minimum at $\alpha_{\text{one-pool}} = (n-1)/n = 0.99$. When LLOD > $\mu_x$, $\mathrm{Var}(\hat\mu_x)$ takes its minimum at an intermediate value $0 < \alpha < \alpha_{\text{one-pool}}$, as shown in Figure 1(a) and (b); that is, a hybrid design is more efficient than measuring only pooled assays or only individual assays. When LLOD < $\mu_x$, the traditional pooling design ($\alpha = 0$) is more efficient than simple random sampling [Vexler, Liu and Schisterman (2006), Mumford et al. (2006)]. However, when a pooled-unpooled hybrid design is applicable, LLOD ≤ $\mu_x$ and the objective is to estimate $\mu_x$, we recommend the one-pool design, provided that pooling and measurement errors are negligible. When $N$ is very large, however, pooling $N - (n-1)$ specimens might exceed laboratory limitations.
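The asymptotic variances behind Figure 1 come from the inverse of the Fisher information. As an illustration only (our own sketch, not the supplementary R code), the function below uses the expected information of one normal value with mean $m$ and standard deviation $s$, left-censored at the standardized point $z = (\mathrm{LLOD} - m)/s$, obtained by integrating the squared score over the observed and censored regions: $s^2 I_{mm} = 1 - \Phi(z) + z\phi(z) + \phi(z)^2/\Phi(z)$, $s^2 I_{ms} = (z^2+1)\phi(z) + z\phi(z)^2/\Phi(z)$ and $s^2 I_{ss} = 2(1-\Phi(z)) + (z^3+z)\phi(z) + z^2\phi(z)^2/\Phi(z)$. With negligible errors, a pool of size $w$ has $s = \sigma_x/\sqrt{w}$, and the chain rule converts these entries to the parameters $(\mu_x, \sigma_x)$. Rounding $p$ with halves up is an assumption.

```python
import math


def censored_normal_info(z: float):
    """Expected information (times s^2) for one N(m, s^2) value left-censored at m + z*s.

    Returns the (m, m), (m, s) and (s, s) entries.
    """
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cens = 0.5 * math.erfc(-z / math.sqrt(2.0))   # P(censored) = Phi(z)
    obs = 0.5 * math.erfc(z / math.sqrt(2.0))     # P(observed) = 1 - Phi(z), no cancellation
    lam2 = phi * phi / cens if cens > 0.0 else 0.0
    i_mm = obs + z * phi + lam2
    i_ms = (z * z + 1.0) * phi + z * lam2
    i_ss = 2.0 * obs + (z ** 3 + z) * phi + z * z * lam2
    return i_mm, i_ms, i_ss


def hybrid_asymptotic_variances(N, n, alpha, llod, mu_x=0.0, sigma_x=1.0):
    """n*Var(mu_hat) and n*Var(sigma_hat) for the two-assay hybrid design, no assay errors."""
    n_ind = math.floor(alpha * n + 0.5)
    n_pool = n - n_ind
    groups = [(n_ind, 1)]
    if n_pool > 0:
        groups.append((n_pool, math.floor((N - n_ind) / n_pool + 0.5)))
    i_mu = i_cross = i_sig = 0.0
    for count, w in groups:
        if count == 0:
            continue
        s = sigma_x / math.sqrt(w)                 # sd of one assay with pooling size w
        i_mm, i_ms, i_ss = censored_normal_info((llod - mu_x) / s)
        # Chain rule with s = sigma_x / sqrt(w): ds / dsigma_x = 1 / sqrt(w).
        i_mu += count * i_mm * w / sigma_x ** 2
        i_cross += count * i_ms * math.sqrt(w) / sigma_x ** 2
        i_sig += count * i_ss / sigma_x ** 2
    det = i_mu * i_sig - i_cross ** 2
    return n * i_sig / det, n * i_mu / det


if __name__ == "__main__":
    for llod in (-5.0, 0.0, 0.5):
        row = [hybrid_asymptotic_variances(1000, 100, a, llod)[0] for a in (0.0, 0.5, 0.99)]
        print(llod, [round(v, 4) for v in row])
```

With a negligible LLOD both $\alpha = 0$ and the one-pool design give $n\mathrm{Var}(\hat\mu_x) \approx n/N = 0.1$, while at LLOD $= \mu_x = 0$ the one-pool value is smaller than the $\alpha = 0$ value, in line with the decrease toward the one-pool design described above.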




Fig. 1. $n\mathrm{Var}(\hat\mu_x)$ and $n\mathrm{Var}(\hat\sigma_x)$ versus the proportion α of individual assays among the measured assays, in the absence of measurement and pooling errors, with LLOD = -5, -0.5, -0.1, 0, 0.01, 0.04, 0.1, 0.3 and 0.5 from bottom to top; $N = 1{,}000$, $n = 100$, $\mu_x = 0$ and $\sigma_x = 1$.



Figure 1(c) shows that $\mathrm{Var}(\hat\sigma_x)$ is also approximately constant when the LLOD is negligible (e.g., LLOD = -5). For LLOD < $\mu_x$ (e.g., LLOD = $\mu_x - 0.5\sigma_x$), the pool design ($\alpha = 0$) minimizes $\mathrm{Var}(\hat\sigma_x)$. For LLOD ≥ $\mu_x$ (e.g., 0, 0.01, 0.04, 0.1, 0.3 and 0.5), $\mathrm{Var}(\hat\sigma_x)$ takes its minimum when the one-pool design is used, as shown in Figure 1(c) and (d).

The traditional pooling design involves obtaining $n$ pooled assays with pooling group size $p = N/n$. With this design, the variance of the $\mu_x$-estimator based on $n$ measurements of the pooled assays is $\sigma_x^2/N$. For the one-pool design with pooling group size $p = N - n + 1$, when the LLOD is not in effect, the MLE of $\mu_x$ based on the combined data $\{Z_1^{(1)}, \dots, Z_{n-1}^{(1)}, Z_1^{(p)}\}$ is

$$\hat\mu_x = \frac{1}{n-1+p}\left(\sum_{i=1}^{n-1} Z_i^{(1)} + p\, Z_1^{(p)}\right) = \frac{1}{N}\left(\sum_{s=1}^{n-1} X_s + \sum_{s=n}^{N} X_s\right) = \frac{1}{N}\sum_{s=1}^{N} X_s.$$

Thus, the one-pool design $\{Z_1^{(1)}, \dots, Z_{n-1}^{(1)}, Z_1^{(N-n+1)}\}$ allows estimation of $\mu_x$ with $\mathrm{Var}(\hat\mu_x)$ equal to that based on the traditionally pooled data $\{Z_1^{(N/n)}, \dots, Z_n^{(N/n)}\}$; the same equivalence can be shown for the estimation of $\sigma_x$. For $\mu_x$, this variance differs from, and is smaller than, the variance $\sigma_x^2/n$ based on a simple random sample of individual assays $\{Z_1^{(1)}, \dots, Z_n^{(1)}\}$. The proposed one-pool design is easier to execute than traditional pooling. Moreover, if the parametric assumptions regarding the sample distribution are rejected, the data $\{Z_1^{(1)}, \dots, Z_{n-1}^{(1)}, Z_1^{(N-n+1)}\}$ can easily be used to estimate the unknown distribution, whereas reconstructing the distribution function of $X$ from $\{Z_1^{(N/n)}, \dots, Z_n^{(N/n)}\}$ is a very complicated problem [Vexler, Schisterman and Liu (2008)]. Even when the LLOD plays a role, namely, when LLOD ≤ $\mu_x$, as in Figure 1(a), we can suggest the simple one-pool design.
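The identity above is easy to confirm numerically. In this sketch (simulated standard normal specimens, an arbitrary seed, and a hypothetical helper name), the $p$-weighted average of the $n-1$ individual assays and the single ideal pool reproduces the mean of all $N$ specimens up to floating-point rounding.

```python
import random
import statistics


def one_pool_mean_estimate(individual, pooled_value, p):
    """MLE of mu_x from n - 1 individual assays and one pool of p specimens
    (no LLOD in effect, negligible errors): weight the pool by its size p."""
    return (sum(individual) + p * pooled_value) / (len(individual) + p)


if __name__ == "__main__":
    rng = random.Random(7)
    N, n = 1000, 100
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    individual = x[: n - 1]                       # n - 1 individual assays
    pool = x[n - 1:]                              # remaining N - (n - 1) specimens
    pooled_value = statistics.fmean(pool)         # ideal pooled measurement
    print(one_pool_mean_estimate(individual, pooled_value, len(pool)), statistics.fmean(x))
```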

2.4.2. Normal case with nonnegligible measurement error and pooling error. When pooling error and measurement error are nonnegligible, one approach to estimating them is the three-assay design described in Section 2.1. The expressions for the normal log-likelihood equations and the entries of the Fisher information matrix $I$ can be found in the supplementary material [Schisterman et al. (2011), Section 4]. Figure 2 depicts $n\mathrm{Var}(\hat\mu_x)$, $n\mathrm{Var}(\hat\sigma_x)$, $n\mathrm{Var}(\hat\sigma_p)$ and $n\mathrm{Var}(\hat\sigma_m)$ as functions of α, with $N = 1{,}000$, $n = 100$, $\sigma_x = 1$, $\sigma_p = 0.3$ and $\sigma_m = 0.4$. The curves from bottom to top are for LLOD = -5, -0.5, 0 and 0.5, respectively. Because our hybrid design involves two pooling groups, we set the proportion of the second pooling group to $\beta = 0.4$ and its pooling size to $p_2 = 5$. Note that the rightmost point, $\alpha = [(1-\beta)n - 1]/n = 0.59$, corresponds to the one-pool design, which consists of $(1-\beta)n - 1 = 59$ individual assays, 1 pooled assay with pooling size $p_1 = 741$, and $\beta n = 40$ pooled assays with pooling size $p_2 = 5$.

As the LLOD increases, $\mathrm{Var}(\hat\mu_x)$, $\mathrm{Var}(\hat\sigma_x)$, $\mathrm{Var}(\hat\sigma_m)$ and $\mathrm{Var}(\hat\sigma_p)$ increase. $\mathrm{Var}(\hat\mu_x)$ increases as α increases, that is, the pooled design minimizes $\mathrm{Var}(\hat\mu_x)$. $\mathrm{Var}(\hat\sigma_x)$, $\mathrm{Var}(\hat\sigma_m)$ and $\mathrm{Var}(\hat\sigma_p)$ attain their minimum under the hybrid design. We provide R code as supplementary material [Schisterman et al. (2011)] to calculate $\mathrm{Var}(\hat\mu_x)$, $\mathrm{Var}(\hat\sigma_x)$, $\mathrm{Var}(\hat\sigma_m)$ and $\mathrm{Var}(\hat\sigma_p)$.
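One way to see why the errors change the picture is to look at the censoring probability of each group. Under the normal model of Section 2.3.1, an assay with pooling size $w$ has standard deviation $\sqrt{\sigma_x^2/w + \gamma(w)\sigma_p^2 + \sigma_m^2}$, so as $w$ grows it shrinks only toward $\sqrt{\sigma_p^2 + \sigma_m^2}$ rather than toward zero. The sketch below evaluates $P(Z^{(w)} < \mathrm{LLOD})$ for the group sizes of the one-pool configuration above, using $\sigma_p = 0.3$ and $\sigma_m = 0.4$ as stated in the text; the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def assay_sd(w, sd_x, sd_p, sd_m):
    """SD of an assay with pooling size w: gamma(1) = 0, gamma(w) = 1 for pools."""
    gamma = 0.0 if w == 1 else 1.0
    return math.sqrt(sd_x ** 2 / w + gamma * sd_p ** 2 + sd_m ** 2)


def censoring_probabilities(llod, mu_x, sd_x, sd_p, sd_m, sizes):
    """P(assay < LLOD) for each pooling size in `sizes`."""
    return {w: NormalDist(mu_x, assay_sd(w, sd_x, sd_p, sd_m)).cdf(llod) for w in sizes}


if __name__ == "__main__":
    # Individual assays, pools of size p2 = 5, and the single pool of size p1 = 741.
    for llod in (-0.5, 0.0, 0.5):
        probs = censoring_probabilities(llod, 0.0, 1.0, 0.3, 0.4, (1, 5, 741))
        print(llod, {w: round(pr, 3) for w, pr in probs.items()})
```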
Fig. 2. $n\mathrm{Var}(\hat\mu_x)$, $n\mathrm{Var}(\hat\sigma_x)$, $n\mathrm{Var}(\hat\sigma_m)$ and $n\mathrm{Var}(\hat\sigma_p)$ versus the proportion α of individual assays among the measured assays in the presence of measurement and pooling errors under the three-assay design, with LLOD = -5, -0.5, 0 and 0.5 from bottom to top; the proportion of second pooled assays is $\beta = 0.4$, pooling size $p_2 = 5$, $N = 1{,}000$, $n = 100$, $\mu_x = 0$, $\sigma_x = 1$, $\sigma_m = 0.3$ and $\sigma_p = 0.4$.



2.4.3. Gamma case with measurement error and pooling error. In this subsection we study, by Monte Carlo simulation, the situation with a Gamma distributed biomarker and double exponentially distributed measurement error and pooling error. The two-assay design can be used when the variances of measurement error and pooling error are known. The parameters of the Gamma distributed biomarker are $a = 1.5$ and $b = 0.1$, so the mean of the individual biomarker is $E(X) = ab = 0.15$ and the variance is $\mathrm{Var}(X) = ab^2 = 0.015$. The scale parameters of the double exponentially distributed measurement error and pooling error are $c = 0.02$ and $d = 0.03$, respectively. Both errors have mean zero; the variance of the measurement error is $\mathrm{Var}(e^{(m)}) = 2c^2 = 0.0008$, and the variance of the pooling error is $\mathrm{Var}(e^{(p)}) = 2d^2 = 0.0018$. The number of specimens is $N = 1{,}000$ and the number of assays is $n = 100$. We performed 1,000 simulations to evaluate $\mathrm{Var}(\hat a)$ and $\mathrm{Var}(\hat b)$ at α = 0, 0.2, 0.4, 0.6, 0.8 and 0.99, subject to LLOD = 0.02, 0.05, 0.1 and 0.15.

Fig. 3. $n\mathrm{Var}(\hat a)$ and $n\mathrm{Var}(\hat b)$ versus the proportion α of individual assays among the measured assays for simulated Gamma distributed biomarkers with double exponentially distributed measurement and pooling errors, with $N = 1{,}000$, $n = 100$, $a = 1.5$, $b = 0.1$, $c = 0.02$, $d = 0.03$ and LLOD = 0.02, 0.05, 0.1 and 0.15.

The simulation results are presented in Figure 3. $\mathrm{Var}(\hat a)$ and $\mathrm{Var}(\hat b)$ increase as the LLOD increases. Both decrease as α increases and are minimized under the one-pool design ($\alpha = 0.99$). When LLOD < $E(X)$, $\mathrm{Var}(\hat a)$ does not change much as α increases. However, when LLOD = $E(X)$, $\mathrm{Var}(\hat a)$ becomes substantially larger, especially when α is small: for the pool design ($\alpha = 0$) it is five times larger than for the other LLOD values. $\mathrm{Bias}(\hat a)$ and $\mathrm{Bias}(\hat b)$ for finite sample sizes are presented in Section 5 of the supplementary material [Schisterman et al. (2011)]. They are relatively small except for large LLOD, for example, LLOD = 0.15 (61% missing for individual sampling).
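The share of censored assays quoted above can be approximated directly from the Gamma model. The sketch below evaluates $P(X < \mathrm{LLOD})$ through the power series of the regularized lower incomplete gamma function, for individual specimens ($a = 1.5$, $b = 0.1$) and for ideal pools of size 2 (shape $ap$, scale $b/p$). It ignores measurement and pooling error, a simplifying assumption, so its values only approximate the censoring in the simulation; for LLOD = 0.15 it gives about 0.61 for individual specimens.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """P(X <= x) for X ~ Gamma(shape, scale), via the series for the regularized lower
    incomplete gamma function: t^s e^-t sum_k t^k / (s (s+1) ... (s+k)) / Gamma(s)."""
    if x <= 0.0:
        return 0.0
    t = x / scale
    term = 1.0 / shape
    total = term
    k = 0
    while term > 1e-17 * total and k < 10000:
        k += 1
        term *= t / (shape + k)
        total += term
    return total * math.exp(shape * math.log(t) - t - math.lgamma(shape))


if __name__ == "__main__":
    a, b, p = 1.5, 0.1, 2
    for llod in (0.02, 0.05, 0.1, 0.15):
        individual = gamma_cdf(llod, a, b)
        pooled = gamma_cdf(llod, a * p, b / p)
        print(llod, round(individual, 3), round(pooled, 3))
```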

3. Application. 

3.1. Normally distributed biomarker with negligible measurement and pooling errors. To investigate the efficiency of the hybrid design, we bootstrapped real data from a study of biomarkers of coronary heart disease. In this study, cholesterol level, a biomarker for coronary heart disease, was measured for 40 individuals who had a normal resting electrocardiogram, were free of symptoms, and had no previous cardiovascular procedures or myocardial infarctions. The mean of the individual biomarker assays is 205.53 mg/dl and the standard deviation is 42.29 mg/dl. The Shapiro-Wilk test does not reject normality of the individual assays.

We assume that we have $N = 40$ specimens, that we can only afford to perform $n = 20$ assays, and that the measurement error and pooling error are negligible.
Table 1

Parameters used for the normally distributed biomarker ignoring errors, with number of samples N = 40 and number of assays n = 20

α                              0     0.5   0.75   0.8   0.9   0.95
Number of individual assays    0     10    15     16    18    19
Number of pooled assays        20    10    5      4     2     1
Pooling size p                 2     3     5      6     11    21

Artificial LLODs of 0, 150, 170, 180, 200, 205 and 210 are applied to the cholesterol data. We evaluated six designs, with the α values in Table 1; the rightmost one ($\alpha = 0.95$) is the one-pool design. To generate pooled data with pooling size $p$, we pooled the individual assays together and used the average values as the measured values of the pooled assays. We then combined the unpooled and simulated pooled data and applied the methodology for the two-assay design with negligible measurement and pooling error (Section 2.4.1) to calculate the maximum likelihood estimate of $\mu_x$. This procedure was repeated 100,000 times to obtain $\mathrm{Var}(\hat\mu_x)$.
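A single replicate of this resampling scheme can be sketched as follows. We assume ordinary bootstrap resampling with replacement and, since the 40 cholesterol values are not listed here, use hypothetical stand-in values drawn from a normal distribution with the reported mean and standard deviation; the maximum likelihood step of Section 2.4.1 would then be applied to each returned dataset. Function names are illustrative.

```python
import random
import statistics


def bootstrap_hybrid_dataset(values, n_ind, n_pool, p, llod, rng):
    """One bootstrap replicate of a two-assay hybrid design.

    Resamples n_ind + n_pool * p specimens with replacement, keeps n_ind of them as
    individual assays, averages consecutive blocks of p into pooled assays (no errors),
    and returns (w, z) pairs with z = None for assays at or below the LLOD.
    """
    draw = [rng.choice(values) for _ in range(n_ind + n_pool * p)]
    individual = draw[:n_ind]
    pooled = [statistics.fmean(draw[n_ind + k * p: n_ind + (k + 1) * p]) for k in range(n_pool)]
    assays = [(1, v) for v in individual] + [(p, v) for v in pooled]
    return [(w, v if v > llod else None) for w, v in assays]


if __name__ == "__main__":
    rng = random.Random(42)
    stand_in = [rng.gauss(205.53, 42.29) for _ in range(40)]   # hypothetical, not study data
    # alpha = 0.75 row of Table 1: 15 individual assays, 5 pools of size 5, N = 40.
    data = bootstrap_hybrid_dataset(stand_in, 15, 5, 5, 180.0, rng)
    print(sum(z is None for _, z in data), "of", len(data), "assays censored")
```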

The results are shown in Figure 4. When LLOD < $\hat\mu_x - \hat\sigma_x$ (e.g., 0 and 150), $\mathrm{Var}(\hat\mu_x)$ is approximately constant. When $\hat\mu_x - \hat\sigma_x$ < LLOD < $\hat\mu_x$ (e.g., 170 and 180), $\mathrm{Var}(\hat\mu_x)$ decreases as α increases, and the minimum is attained under the one-pool design. When the LLOD is close to $\hat\mu_x$ (e.g., 200 and 205), $\mathrm{Var}(\hat\mu_x)$ takes its minimum at an intermediate value $0 < \alpha < 1$, so a hybrid design is favorable. Although the one-pool design does not give the minimum, its $\mathrm{Var}(\hat\mu_x)$ (78.2 for LLOD = 200) is close to the minimum (68.7); given its simplicity, the one-pool design can be recommended. When LLOD > $\hat\mu_x$ (e.g., 210), $\mathrm{Var}(\hat\mu_x)$ increases as α increases, and its maximum is attained under the one-pool design.
Fig. 4. $\mathrm{Var}(\hat\mu_x)$ versus the proportion α of individual assays among the measured assays, by bootstrapping with $N = 40$, $n = 20$ and LLOD = 0, 150, 170, 180, 200, 205 and 210.
3.2. Gamma distributed biomarker with double exponentially distributed measurement error and pooling error. In this subsection we illustrate the two-assay design with replicates using real data from a study of the chemokine biomarker monocyte chemotactic protein-1 (MCP-1). MCP-1 plays a role in a variety of pathological conditions such as inflammatory and immune reactions. Assays were run on different plates, and each plate has its own LLOD. In this article we use only the data from the plates with LLOD = 0.016, because our model requires a common LLOD. Each plate was measured twice. There are 99 individual sampling assays and 45 pooled assays with $p = 2$. The mean of the individual sampling assays is 0.189, and the standard deviation is 0.183. The measurement errors can be calculated from the differences of replicate individual sampling assays [see (1)], and the pooling errors from the differences of replicate pooled assays [see (2)]. We used the R package VGAM [Yee (2010)] to fit the difference of individual replicates $\Delta Z^{(1)}$ and obtain the estimate of the parameter $c$. We then fit the difference of pooled replicates $\Delta Z^{(p)}$, which follows a double exponential distribution with parameter $e$. The estimated variances of measurement error and pooling error can be obtained by

$$\widehat{\mathrm{Var}}\bigl(e^{(m)}\bigr) = \frac{\widehat{\mathrm{Var}}\bigl(\Delta Z^{(1)}\bigr)}{2}, \qquad \widehat{\mathrm{Var}}\bigl(e^{(p)}\bigr) = \frac{\widehat{\mathrm{Var}}\bigl(\Delta Z^{(p)}\bigr) - \widehat{\mathrm{Var}}\bigl(\Delta Z^{(1)}\bigr)}{2}.$$
After we obtained the estimates of the variances of pooling error and mea- 
surement error, we used one individual sampling group and one pooling 
group to estimate the other parameters, for example, a and b of the Gamma 
distributed biomarker. 
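The paper fits these differences with the R package VGAM. As a language-neutral illustration of what such a fit estimates (not VGAM's implementation), recall that for a double exponential sample the maximum likelihood estimate of the location is the sample median and that of the scale is the mean absolute deviation about the median; the implied variance is $2s^2$, which can then be plugged into the variance formulas above. The sketch checks the fit on simulated data; the seed, sample size, helper names and the scale values 0.033 and 0.050 are illustrative assumptions.

```python
import random
import statistics


def fit_double_exponential(xs):
    """ML fit of a double exponential (Laplace) distribution: returns (location, scale)."""
    loc = statistics.median(xs)                          # ML location: the median
    scale = statistics.fmean(abs(x - loc) for x in xs)   # ML scale: mean absolute deviation
    return loc, scale


def error_variances(scale_d1, scale_dp):
    """Var(e_m) and Var(e_p) from fitted scales of Delta Z^(1) and Delta Z^(p) (Var = 2 s^2)."""
    var_d1 = 2.0 * scale_d1 ** 2
    var_dp = 2.0 * scale_dp ** 2
    return var_d1 / 2.0, (var_dp - var_d1) / 2.0


if __name__ == "__main__":
    rng = random.Random(3)
    sample = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / 0.033) for _ in range(50000)]
    print(fit_double_exponential(sample))    # location near 0, scale near 0.033
    print(error_variances(0.033, 0.050))     # illustrative scales, not study estimates
```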

The histograms of the individual biomarker $Z^{(1)}$, the difference of measurement errors $e_1^{(m)} - e_2^{(m)}$, and the difference of the sums of measurement error and pooling error $(e_1^{(m)} + e_1^{(p)}) - (e_2^{(m)} + e_2^{(p)})$ are illustrated in Figure 5. The fitted curves are generated from the parameters estimated by the R package VGAM. The estimated parameters are presented in Table 2. For the double exponential distribution, the estimated variance is $2s^2$, where $s$ is the scale parameter. The estimated $\mathrm{Var}(e^{(m)})$ and $\mathrm{Var}(e^{(p)})$ are presented in Table 2, together with their corresponding scale parameters. For the Gamma distribution, the estimated mean is $ab$ and the estimated variance is $ab^2$. Table 2 shows that the sample variances are very close to the estimated variances, and the fitted curves in Figure 5 fit the histograms quite well.

For fixed $N$ and $n$, we need to vary the pooling size $p$ to vary α. However, we only have individual unpooled data and pooled data with pooling size $p = 2$, so we pooled the $p = 2$ pooled assays together to generate data with other pooling sizes. Because we want to include the measurement error and pooling error in the pooled assays, we used pooled assays rather than individual sampling assays to generate pooled assays with different pooling sizes. For example,

$$Z^{(4)} = \frac{1}{2}\left(\frac{X_1 + X_2}{2} + e_1^{(p)} + e_1^{(m)} + \frac{X_3 + X_4}{2} + e_2^{(p)} + e_2^{(m)}\right) = \frac{1}{4}(X_1 + X_2 + X_3 + X_4) + \frac{1}{2}\bigl(e_1^{(p)} + e_2^{(p)}\bigr) + \frac{1}{2}\bigl(e_1^{(m)} + e_2^{(m)}\bigr).$$

Fig. 5. Histograms of the individual biomarker $Z^{(1)}$, the difference of measurement errors $e_1^{(m)} - e_2^{(m)}$, and the difference of the sums of measurement error and pooling error $(e_1^{(m)} + e_1^{(p)}) - (e_2^{(m)} + e_2^{(p)})$.
Table 2

The estimates of the parameters for the individual biomarker Z^(1), the difference of measurement errors e1^(m) - e2^(m), the difference of the sums of measurement error and pooling error (e1^(m) + e1^(p)) - (e2^(m) + e2^(p)), measurement error e^(m) and pooling error e^(p). Here a and b are the shape and scale parameters of the Gamma distribution, respectively, and s is the scale parameter of the double exponential distribution; "-" marks entries that do not apply or are not reported

                                                                  Mean                  Variance
                                        a      b      s       Estimated  Sample     Estimated  Sample
Z^(1)                                   1.54   0.12   -       0.189      0.189      0.023      0.034
e1^(m) - e2^(m)                         -      -      0.033   -          -0.0034    0.0022     0.0029
(e1^(m) + e1^(p)) - (e2^(m) + e2^(p))   -      -      0.050   -          0.012      0.0051     0.0059
e^(m)                                   -      -      0.023   -          -          0.0011     -
e^(p)                                   -      -      0.027   -          -          0.0015     -

We then combined the individual unpooled data with measured ($p = 2$) or simulated ($p > 2$) pooled data to generate a hybrid design. The pooling sizes we used are presented in Table 3. We assume that we have $N = 79$ or 80 specimens and can only afford to perform $n = 40$ assays. Besides the true LLOD = 0.016, additional LLODs of 0.05, 0.1 and 0.15 are applied to evaluate the influence of the LLOD.

The results are illustrated in Figure 6. As α increases, $\mathrm{Var}(\hat a)$ increases and then decreases at the one-pool design ($\alpha = 0.975$), where it attains its second smallest value. This pattern differs from the simulation result, in which $\mathrm{Var}(\hat a)$ decreases as α increases and the minimum is reached under the one-pool design. When the LLOD is very small (i.e., LLOD = 0.016), $\mathrm{Var}(\hat a)$ does not change much. $\mathrm{Var}(\hat b)$ decreases as α increases, which is consistent with the simulation result.

Table 3

Parameters used for the Gamma distributed biomarker with double exponentially distributed errors, with number of assays n = 40

α                              0     0.675   0.8   0.925   0.975
Number of individual assays    0     27      32    37      39
Number of pooled assays        40    13      8     3       1
Pooling size p                 2     4       6     14      40
Number of samples N            80    79      80    79      79

Fig. 6. $n\mathrm{Var}(\hat a)$ and $n\mathrm{Var}(\hat b)$ versus the proportion α of individual assays among the measured assays, by bootstrapping with $N = 79$ or 80, $n = 40$ and LLOD = 0.016, 0.05, 0.1 and 0.15.

4. Summary and discussion. Although the pooling design can increase the efficiency of estimation from data subject to a LLOD, there are situations when the pooling design strongly aggravates the detection limit problem. A hybrid design was proposed in order to gain the benefits of both individual assays and pooled assays [Schisterman et al. (2010)].

In this article we present methodology for determining a hybrid design 
that most efficiently estimates parameters from data subject to measurement 
error, pooling error and a limit of detection. Efficiency is gauged by the variance of the maximum likelihood estimator of a parameter. We presented the asymptotic MLE variances as functions of the proportion of individual assays among the measured assays. To estimate both measurement error and pooling error, a three-assay design or a two-assay design with replicates is needed. We examined two cases: a normally distributed biomarker with normally distributed errors, and a Gamma distributed biomarker with double exponentially distributed errors.

Under the condition that we have $N$ specimens and can perform only $n < N$ assays, we evaluated the efficiency of the one-pool hybrid design, which involves assaying $n - 1$ individual specimens and one pooled sample of the remaining $N - (n-1)$ individual specimens. When measurement error and pooling error are negligible, for the normally distributed biomarker, the one-pool design minimizes $\mathrm{Var}(\hat\mu_x)$ for LLOD ≤ $\mu_x$ and $\mathrm{Var}(\hat\sigma_x)$ for LLOD ≥ $\mu_x$. When measurement error and pooling error are in effect, the pooled design minimizes $\mathrm{Var}(\hat\mu_x)$, while the hybrid design minimizes $\mathrm{Var}(\hat\sigma_x)$, $\mathrm{Var}(\hat\sigma_m)$ and $\mathrm{Var}(\hat\sigma_p)$. The α value corresponding to the minimum can be obtained with the R code that we provide as supplementary material [Schisterman et al. (2011)]. Note that, in practice, our interest is in $\mu_x$ and $\sigma_x$, not in $\sigma_p$ or $\sigma_m$. The simulation results show that the one-pool design minimizes both $\mathrm{Var}(\hat a)$ and $\mathrm{Var}(\hat b)$ for the Gamma distribution under complex measurement error and pooling error assumptions. Hence, under the circumstances described above, when one seeks to avoid more complicated procedures for determining and executing a potentially more efficient hybrid design, the one-pool hybrid design is an efficient and easily implemented alternative to a simple random sample of individual assays.
SUPPLEMENTARY MATERIAL

R code and detailed derivations (DOI: 10.1214/11-AOAS490SUPP; .pdf). R code used to calculate $n\mathrm{Var}(\hat\mu_x)$, $n\mathrm{Var}(\hat\sigma_x)$, $n\mathrm{Var}(\hat\sigma_m)$ and $n\mathrm{Var}(\hat\sigma_p)$. Detailed derivation of maximum likelihood estimates and the Fisher information matrix.